\section{EM Training Formula}
\label{sec:emtrainginformula}
Training the model consists of finding the values for $t(f|e)$. If we had data
with the alignments already given, we could calculate these values easily. As
described in section \ref{sec:ibmModel1}, $t(f|e)$ is the
probability of generating French word $f$ from English word $e$. This would be
$t(f|e) = \frac{c(e, f)}{c(e)}$: the number of times English word $e$ is
aligned with French word $f$, divided by the number of times English word $e$ is
aligned to any French word.
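
With annotated data, this reduces to simple counting. A minimal Python sketch, assuming a hypothetical toy corpus in which each item pairs an English sentence, a French sentence, and a gold alignment mapping French positions to English positions:

```python
from collections import defaultdict

# Hypothetical toy corpus with gold alignments: each item is
# (english_sentence, french_sentence, alignment), where alignment
# maps French position i to the English position j it is aligned to.
corpus = [
    (["the", "house"], ["la", "maison"], {0: 0, 1: 1}),
    (["the", "book"], ["le", "livre"], {0: 0, 1: 1}),
    (["a", "house"], ["une", "maison"], {0: 0, 1: 1}),
]

count_ef = defaultdict(float)  # c(e, f): times e is aligned with f
count_e = defaultdict(float)   # c(e): times e is aligned to any French word

for english, french, alignment in corpus:
    for i, j in alignment.items():
        count_ef[(english[j], french[i])] += 1
        count_e[english[j]] += 1

# t(f|e) = c(e, f) / c(e)
t = {(e, f): c / count_e[e] for (e, f), c in count_ef.items()}

print(t[("the", "la")])  # "the" aligns once to "la" and once to "le" -> 0.5
```
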

Unfortunately, we do not have annotated data. Hence we apply an Expectation
Maximization (EM) algorithm to calculate these values.

First we initialize the values of $t(f|e)$ randomly. Then we iterate a fixed number
of times, or until convergence. At each iteration, for the $k$-th sentence pair, we
\emph{estimate} the counts $c(e)$ and $c(e, f)$ based on the current parameter settings, using the
following formula:

\begin{equation}
\delta(k, i, j) = \frac{t(f^{(k)}_i | e^{(k)}_j)}{\sum\limits^{l_k}_{j'=0} t(f^{(k)}_i | e^{(k)}_{j'})}
\end{equation}
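
Intuitively, $\delta(k, i, j)$ normalizes $t(f^{(k)}_i | e^{(k)}_j)$ over all English positions, yielding the posterior probability that $f^{(k)}_i$ is aligned to $e^{(k)}_j$ under the current parameters. A minimal Python sketch for a single French word, with hypothetical parameter values:

```python
# Hypothetical current parameters t(f|e) for one French word "maison";
# e_0 is the NULL word. delta normalizes these over the English positions.
t = {("NULL", "maison"): 0.1, ("the", "maison"): 0.3, ("house", "maison"): 0.6}
english = ["NULL", "the", "house"]

total = sum(t[(e, "maison")] for e in english)          # denominator of delta
delta = {e: t[(e, "maison")] / total for e in english}  # posterior per position

print(delta["house"])  # 0.6 / 1.0 = 0.6
```
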

After these counts have been accumulated over all sentence pairs, $t(f|e)$ is re-estimated, or \emph{maximized}, using the estimated counts.

Algorithm \ref{alg:em1} formalizes the EM training algorithm for IBM Model 1.

\begin{algorithm}[H]
\small
\begin{algorithmic}[1]
\Repeat
    \State $c(\dots) = 0$  \Comment{set all counts to 0}
    \For{$k \gets 1 \dots n$} \Comment{Loop over all $n$ sentences}
        \For{$i \gets 1 \dots m_k$} \Comment{Loop over French words}
            \For{$j \gets 0 \dots l_k$} \Comment{and the English words, including $e^{(k)}_0$}
                \State $c(e^{(k)}_j) \gets c(e^{(k)}_j) + \delta(k, i, j)$
                \State $c(e^{(k)}_j, f^{(k)}_i) \gets c(e^{(k)}_j, f^{(k)}_i) + \delta(k, i, j)$
            \EndFor
        \EndFor
    \EndFor
    \State $t(f|e) \gets \frac{c(e, f)}{c(e)}$
\Until{convergence}
\end{algorithmic}
\caption{The EM training algorithm for IBM Model 1}
\label{alg:em1}
\end{algorithm}
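
The algorithm above can be sketched in Python. This is a minimal illustration, assuming a hypothetical toy corpus and uniform initialization (random initialization works as well); it is not a full word-alignment implementation:

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy parallel corpus; "NULL" is prepended as e_0.
corpus = [
    (["NULL", "the", "house"], ["la", "maison"]),
    (["NULL", "the", "book"], ["le", "livre"]),
    (["NULL", "a", "book"], ["un", "livre"]),
]

english_vocab = {e for eng, _ in corpus for e in eng}
french_vocab = {f for _, fr in corpus for f in fr}

# Initialize t(f|e) uniformly over the French vocabulary.
t = {(f, e): 1.0 / len(french_vocab)
     for f, e in product(french_vocab, english_vocab)}

for _ in range(20):  # fixed number of iterations instead of a convergence test
    count_ef = defaultdict(float)  # expected c(e, f)
    count_e = defaultdict(float)   # expected c(e)
    # E-step: accumulate expected counts via delta.
    for english, french in corpus:
        for f in french:
            norm = sum(t[(f, e)] for e in english)  # denominator of delta
            for e in english:
                delta = t[(f, e)] / norm
                count_ef[(e, f)] += delta
                count_e[e] += delta
    # M-step: re-estimate t(f|e) from the expected counts.
    t = {(f, e): count_ef[(e, f)] / count_e[e] for (f, e) in t}

# Most likely translation of "book" under the learned parameters.
best = max(french_vocab, key=lambda f: t[(f, "book")])
print(best)
```

After a few iterations the probability mass for \texttt{book} concentrates on \texttt{livre}, since they co-occur in two sentence pairs while the competing French words each co-occur only once.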
